DETAILED ACTION

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on 
sale in this country, more than one year prior to the date of application for patent in the United States. 

2. Claims 3 and 4 are rejected under 35 U.S.C. 102(b) as being anticipated by Kitaoka et al. 
(hereinafter "Kitaoka"), US Patent App. Pub. 2002/0010579. 

Regarding claim 3, Kitaoka teaches a voice recognition index-searching device
comprising: a similar-word indexer that stores relationships between a representative word,
selected from each of word groups generated in advance by categorizing a plurality of words into 
groups in which words resemble in pronunciation, and its group (paragraph 32 teaches "the 
speech recognition apparatus generates and stores the similar sound group of the specific word 
beforehand. . .similar sound group includes reference patterns corresponding to sounds which are 
different from but similar to that of the specific word. . .rerecognition of the speech signal is
performed by using the similar sound group of the specific word"); and 

a searching device that searches for similar words within a group, said searching device 
that collates a sound feature vector for the representative word for each group stored in the 
similar-word indexer against a given sound feature vector to calculate respective acoustic 
likelihoods, and collating a sound feature vector for each word in that group whose 
representative word has an acoustic likelihood, among the calculated results, not less than a 
predetermined threshold, against the given sound feature vector to calculate respective acoustic
likelihoods, and outputting the word having the greatest acoustic likelihood (paragraph 30, 
teaches "pattern matching section performs pattern matching between each of the reference 
patterns in a vocabulary stored in the dictionary section and the time-series data of the LPC 
cepstrum coefficients. . .similarity. . .likelihood ratio. . .between each of the reference patterns and 
each of the segments is computed"; paragraph 31 teaches, "pattern matching section selects as
candidate words one or more words corresponding to the reference patterns which have high 
similarities with the LPC cepstrum coefficients"; and paragraph 45 teaches, "probability that the 
input speech signal actually represents the specific word. . .pattern matching section outputs a 
candidate word other than the specific word as the result of the recognition, if the received 
absolute level of confidence is equal to or lower than a predetermined reference level. . .reference 
level is experimentally determined beforehand"). 
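
For illustration only, the two-stage collation recited in claim 3 may be sketched as follows. This is a minimal sketch under stated assumptions, not code from the application or from Kitaoka: the names `likelihood` and `two_stage_search`, the toy distance-based likelihood, and the example vectors are all hypothetical.

```python
import math


def likelihood(a, b):
    """Toy acoustic likelihood (assumption): closer vectors score higher, max 1.0."""
    return math.exp(-math.dist(a, b))


def two_stage_search(index, features, query, threshold):
    """index maps each representative word to the words of its group.

    Stage 1 collates the query against representatives only; stage 2 collates
    against every word of each group whose representative met the threshold.
    Returns the best-scoring word, or None if no group qualifies.
    """
    best_word, best_score = None, -1.0
    for representative, members in index.items():
        if likelihood(features[representative], query) < threshold:
            continue  # group pruned without examining its members
        for word in members:
            score = likelihood(features[word], query)
            if score > best_score:
                best_word, best_score = word, score
    return best_word


# Hypothetical feature vectors and groups, for demonstration only.
features = {"main": (0.0, 0.0), "maine": (0.1, 0.0),
            "park": (5.0, 5.0), "bark": (5.2, 5.0)}
index = {"main": ["main", "maine"], "park": ["park", "bark"]}
result = two_stage_search(index, features, (5.15, 5.0), 0.5)
```

In this sketch the "main" group is never searched word by word, because its representative falls below the threshold; only the "park" group is collated in full.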

Regarding claim 4, Kitaoka teaches a voice recognition index generator comprising: a
representative word selector that selects a single word as a representative word from an original set
composed of a plurality of words and an acoustically similar word grouper that extracts from the 
original set, a word in which the acoustic likelihood between a sound feature vector for the word 
and a sound feature vector for the representative word is not less than a predetermined threshold, 
and including the extracted word in a same group as the representative word (Kitaoka teaches at 
paragraph 32, "the speech recognition apparatus generates and stores the similar sound group of 
the specific word beforehand. . .similar sound group includes reference patterns corresponding to 
sounds which are different from but similar to that of the specific word. . .rerecognition of the 



Application/Control Number: 10/510,209 Page 4

Art Unit: 2626 

speech signal is performed by using the similar sound group of the specific word"; paragraph 30, 
teaches "pattern matching section performs pattern matching between each of the reference 
patterns in a vocabulary stored in the dictionary section and the time-series data of the LPC 
cepstrum coefficients. . .similarity. . .likelihood ratio. . .between each of the reference patterns and 
each of the segments is computed."; paragraph 31 teaches, "pattern matching section selects as 
candidate words one or more words corresponding to the reference patterns which have high 
similarities with the LPC cepstrum coefficients"; and paragraph 45 teaches, "probability that the 
input speech signal actually represents the specific word. . .pattern matching section outputs a 
candidate word other than the specific word as the result of the recognition, if the received 
absolute level of confidence is equal to or lower than a predetermined reference level. . .reference 
level is experimentally determined beforehand"); and 

an original-set replacer that passes to the representative word selector the word set left by 
removing from the original set the word affiliated by the group, as another original set to be 
processed by the representative word selector (paragraph 33 teaches "apparatus further generates 
reference patterns corresponding to sounds similar to that of a second specific word. . .second 
specific word is a word which means the opposite to the specific word. . .generated reference 
patterns are added to the similar sound group"). 
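
For illustration only, the iterative grouping recited in claim 4 (select a representative, extract acoustically similar words, pass the remainder back as a new original set) may be sketched as follows. This is a sketch under stated assumptions and is not taken from the application or from Kitaoka: the names `likelihood` and `build_groups`, the toy likelihood, and the rule of choosing the first remaining word as representative are all hypothetical.

```python
import math


def likelihood(a, b):
    """Toy acoustic likelihood (assumption): closer vectors score higher, max 1.0."""
    return math.exp(-math.dist(a, b))


def build_groups(features, threshold):
    """Partition words into groups keyed by a representative word."""
    remaining = list(features)  # the current "original set"
    groups = {}
    while remaining:
        representative = remaining[0]  # assumption: first remaining word
        # The representative always joins its own group, so the loop terminates.
        members = [representative] + [
            word for word in remaining[1:]
            if likelihood(features[word], features[representative]) >= threshold
        ]
        groups[representative] = members
        # The words left over become the next original set.
        remaining = [word for word in remaining if word not in members]
    return groups


# Hypothetical feature vectors, for demonstration only.
features = {"main": (0.0, 0.0), "maine": (0.1, 0.0),
            "park": (5.0, 5.0), "bark": (5.2, 5.0)}
groups = build_groups(features, 0.5)
```

With these example vectors the sketch yields two groups, one represented by "main" and one by "park".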

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the
manner in which the invention was made. 

4. Claims 1, 5 and 7 are rejected under 35 U.S.C. 103(a) as being unpatentable over Kitaoka
et al. in view of Khan et al. (hereinafter "Khan"), US Patent App. Pub. 2002/0111810.

Regarding claim 1, Kitaoka teaches a voice recognition device for a car navigation
system, comprising: 

a sound analyzer that acoustically analyzes a user's vocal utterance inputted by a voice 
input means and for outputting a feature vector for the input sound (paragraph 28 teaches an 
acoustic analysis section, and paragraph 29 teaches a feature extraction section); 

an acoustic-model storage that stores in advance respective acoustic models for 
predetermined sound units, either a syllable or a phoneme being deemed a sound unit (paragraph 
30 teaches, "pattern matching between each of reference patterns in a vocabulary stored in the 
dictionary section and time-series data of the LPC cepstrum coefficients"); 

a sound-unit recognizer that checks the input-sound feature vector against the acoustic 
models to output a correlated sound-unit recognition candidate string (paragraphs 30-31, "the 
time-series data is divided into segments by using hidden Markov models and the similarity (i.e., 
likelihood ratio) between each of the reference patterns and each of the segments is 
computed. . . [e]ach of the reference patterns is a time-series of LPC cepstrum coefficients which 
are computed beforehand and correspond to one of words which should be identified"); 

Kitaoka does not explicitly teach, but Khan suggests, a word-and-position-information 
registration unit that correlates and registers in a word-and-position-information correlation 
dictionary the sound-unit recognition candidate string and position information acquired from a 
main unit of the car navigation system (Khan, Abstract, teaches a "navigation system includes an 
automatic speech recognition program that matches spoken words that describes geographic
features...to entries in a word list...geographic features closest to a certain position of a vehicle in
which the navigation system is installed...[a]s the vehicle travels through a geographic area, the
word list is rebuilt to include entries that correspond to the named geographic features closest to 
the new current vehicle position"; paragraph 51 teaches "name pronunciation data associated 
with those represented features that are closest to the current vehicle position"). 

Kitaoka in combination with Khan teaches a position-information searcher/outputter that
calculates acoustic likelihoods by collating the input-sound feature vector outputted by the sound 
analyzer, against sound feature vectors for the sound-unit recognition candidate strings in the 
word-and-position-information correlation dictionary, and outputting, to the car navigation main 
unit, position information associated with that sound-unit recognition candidate string whose 
calculated acoustic likelihood is not less than a predetermined threshold (Kitaoka, at
paragraph 30, teaches "pattern matching section performs pattern matching between each of the
reference patterns in a vocabulary stored in the dictionary section and the time-series data of the 
LPC cepstrum coefficients . . . similarity . . . likelihood ratio . . . between each of the reference patterns 
and each of the segments is computed"; paragraph 31 teaches, "pattern matching section selects
as candidate words one or more words corresponding to the reference patterns which have high 
similarities with the LPC cepstrum coefficients"; and paragraph 45 teaches, "probability that the 
input speech signal actually represents the specific word. . .pattern matching section outputs a 
candidate word other than the specific word as the result of the recognition, if the received 
absolute level of confidence is equal to or lower than a predetermined reference level. . .reference 
level is experimentally determined beforehand"; Khan teaches word-and-position information 
and, at paragraph 53, teaches "active word list that includes entries for named geographic features
that are close to the vehicle position...active word list...have a plurality of entries...[e]ach entry
represents the phonetic pronunciation of the name of a particular represented geographic 
feature"; paragraph 58 teaches, "the geographic database is organized in a manner that facilitates 
finding the name pronunciation data for geographic features spatially... facilitate identifying name 
pronunciation data for geographic locations based upon the proximity of the geographic data 
from a selectable position"; paragraph 69, "name pronunciation data in the active word
list...available for use by the automatic speech recognition program...threshold
monitor...obtaining a new vehicle position...active word list").

It would have been obvious for one of ordinary skill in the art to combine the teaching 
elements of Kitaoka and Khan to include word-and-position information because Khan teaches 
his method has several advantages including "improved performance (as measured by reduced 
processing time and reduced memory requirements) of ASR algorithms operating in an in-vehicle
environment" (paragraph 84).

Regarding claim 5, Kitaoka does not, but Khan suggests wherein the position-information
searcher/outputter includes a voice recognition index-searching device, and uses the voice 
recognition index-searching device to search for and output words, their pronunciations, and 
position information stored in the word-and-position-information correlation dictionary or an 
external storage device (paragraph 53, teaches "active word list that includes entries for named 
geographic features that are close to the vehicle position...active word list...have a plurality of
entries...[e]ach entry represents the phonetic pronunciation of the name of a particular
represented geographic feature"; paragraph 58 teaches, "the geographic database is organized in
a manner that facilitates finding the name pronunciation data for geographic features 
spatially... facilitate identifying name pronunciation data for geographic locations based upon the 
proximity of the geographic data from a selectable position"; paragraph 69, "name pronunciation 
data in the active word list...available for use by the automatic speech recognition
program...threshold monitor...obtaining a new vehicle position...active word list").

It would have been obvious for one of ordinary skill in the art to combine the teaching 
elements of Kitaoka and Khan to include word-and-position information because Khan teaches 
his method has several advantages including "improved performance (as measured by reduced 
processing time and reduced memory requirements) of ASR algorithms operating in an in-vehicle
environment" (paragraph 84).

Regarding claim 7, Kitaoka teaches a car navigation system comprising:
a current position detector (paragraph 20, position detection unit); 
a map data storage (paragraph 21, map data input unit); 
an image display (paragraph 23, display unit); 

a graphical pointer (paragraph 22 teaches, "control switches. . .mechanical switches. . . 
remote-control terminal"; paragraph 23 teaches "pointers which indicate the present position or 
traveling direction of the vehicle"); and

a destination input device (paragraph 22, control switches). 

The rest of the limitations of claim 7 are the same as or similar to those of claim 1, 
rejected above, and thus are rejected for the same reasons.

5. Claims 2 and 6 are rejected under 35 U.S.C. 103(a) as being unpatentable over Kitaoka et
al. in view of Khan et al., and further in view of Ittycheriah et al. (hereinafter "Ittycheriah"), US
Patent 6,192,337. 

Regarding claim 2, Kitaoka and Khan do not explicitly teach, but Ittycheriah teaches: a
confused-sound-unit matrix storage that stores in advance respective probabilities that an actual 
sound unit uttered by a human being will be recognized as a different recognition result as a 
consequence of the recognition precision of the sound analysis means, for each of recognition-result
sound units (col. 8, ll. 49-67, teaches "distance measures calculated by the rejection
processor for the comparisons between the newly uttered word and the existing words are 
preferably tabulated. . .tabular format may be organized in ranks based on an acoustic 
confusability threshold value...threshold value is set...any new word which results in a distance
measure or score falling at or below the threshold value results in the newly uttered word being 
identified as likely to cause confusion with the associated existing word"); and 

a word developer that outputs a candidate resembling the sound-unit recognition 
candidate string by replacing each sound unit in the sound-unit recognition candidate string 
outputted by the sound-unit recognition, with a recognition-result sound unit in which the 
probability that the confused-sound-unit matrix storage has stored for that sound unit is not less 
than a predetermined threshold (col. 8, ll. 49-67, "if the newly uttered word results in a distance
measure falling above the threshold value, then the new word is identified as not likely to cause
confusion with the associated existing word"; col. 7, ll. 38-51, teaches "labeler outputs the
symbols which comprise the predicted baseform...a leaf sequence corresponding to the predicted
baseform is formed for the word uttered by the user"; col. 6, ll. 53-56, teaches "baseform and leaf
sequences...baseform of a word is a sequence of phonetic units (e.g., phones) that make up the
word"). 
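
For illustration only, the word developer limitation of claim 2 (replacing each recognized sound unit with units whose stored confusion probability is not less than a threshold) may be sketched as follows. This is a sketch under stated assumptions, not code from the application or from Ittycheriah: the name `expand_candidates`, the layout of the confusion matrix as a nested dictionary keyed by the recognized unit, and the example probabilities are all hypothetical.

```python
from itertools import product


def expand_candidates(units, confusion, threshold):
    """Return candidate strings resembling the recognized unit string.

    confusion[r][u] is taken (assumption) as the stored probability linking
    recognition-result unit r with unit u. Each recognized unit is kept and
    joined by every alternative whose probability meets the threshold.
    """
    options = []
    for unit in units:
        alternatives = [
            other for other, prob in confusion.get(unit, {}).items()
            if prob >= threshold
        ]
        if unit not in alternatives:
            alternatives.insert(0, unit)  # always keep the recognized unit
        options.append(alternatives)
    # Every combination of per-position alternatives is one candidate string.
    return list(product(*options))


# Hypothetical confusion probabilities, for demonstration only.
confusion = {"b": {"b": 0.70, "p": 0.25, "d": 0.05}, "a": {"a": 0.90}}
candidates = expand_candidates(["b", "a"], confusion, 0.2)
```

Here the recognized string "b a" expands to the candidates "b a" and "p a"; "d" is dropped because its probability falls below the threshold.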

It would have been obvious for one of ordinary skill in the art at the time the invention
was made to combine the teaching elements of Kitaoka and Khan with Ittycheriah to include a
confused-sound-unit matrix because Ittycheriah teaches that a large vocabulary poses a problem
to a user when one word is too similar to another, such that the speech recognizer is much less
accurate on these words if they appear on the same list; a confused-sound-unit matrix would
assist in handling this problem.

Kitaoka does not, but Khan suggests wherein the word-and-position-information 
registration correlates the resembling candidate to the position information acquired from the car 
navigation system main unit and registers this information in the word-and-position-information 
correlation dictionary (Khan teaches word-and-position information and, at paragraph 48, teaches
"threshold monitor routine obtains data indicating the current vehicle position. . .data indicating 
the current vehicle position may include the geographic coordinates of the vehicle position or 
alternatively, the data indicating the current vehicle position may be referenced to the map data 
contained in the geographic database that represent the road network"; paragraph 42 teaches 
"automatic speech recognition program matches the data representation of spoken words to one 
or more entries in an active word list (or dictionary). . .performing. . .matching"). 

It would have been obvious for one of ordinary skill in the art to combine the teaching 
elements of Kitaoka and Khan to include word-and-position information because Khan teaches 
his method has several advantages including "improved performance (as measured by reduced 
processing time and reduced memory requirements) of ASR algorithms operating in an in-vehicle
environment" (paragraph 84).

Regarding claim 6, Kitaoka and Khan do not explicitly teach, but Ittycheriah suggests,
wherein a word developer extracts a probability stored in a confused-sound-unit
matrix storage for each sound unit of the resembling candidate, and outputs a probability list
for the resembling candidate (col. 7, line 62 - col. 8, line 9, teaches "comparing the newly 
uttered word to all existing vocabulary words to determine potential acoustic 
confusability... calculating respective distance measure or scores therebetween"). 

Kitaoka in combination with Khan and Ittycheriah suggests wherein the
word-and-position-information registration unit correlates and registers in the
word-and-position-information correlation dictionary both the probability list and the similar candidate with the
position information (Kitaoka teaches at paragraph 32, "the speech recognition apparatus 
generates and stores the similar sound group of the specific word beforehand. . .similar sound 
group includes reference patterns corresponding to sounds which are different from but similar to 
that of the specific word. . .rerecognition of the speech signal is performed by using the similar 
sound group of the specific word"; Khan teaches word-and-position information at
paragraphs 53, 58 and 69, as discussed above); and

wherein the position-information searcher/outputter, after reading a resembling word 
candidate stored in the word-and-position-information correlation dictionary and the probability 
list for that resembling word, and if the probability in its probability list is not less than a 
predetermined threshold, calculates the acoustic likelihood by checking the input-sound feature 
vector against the sound feature vector outputted by a sound feature vector generator and outputs
the sound-unit recognition candidate string whose acoustic likelihood is not less than the 
predetermined threshold, and if the probability in the probability list is less than the 
predetermined threshold, the position-information searcher/outputter uses the voice recognition 
index-searching device to search for words, their pronunciations, and position information stored 
in the external storage device (Kitaoka, at paragraph 30, teaches "pattern matching
section performs pattern matching between each of the reference patterns in a vocabulary stored
in the dictionary section and the time-series data of the LPC cepstrum
coefficients. . .similarity. . .likelihood ratio. . .between each of the reference patterns and each of
the segments is computed."; paragraph 31 teaches, "pattern matching section selects as candidate 
words one or more words corresponding to the reference patterns which have high similarities 
with the LPC cepstrum coefficients"; and paragraph 45 teaches, "probability that the input 
speech signal actually represents the specific word... pattern matching section outputs a 
candidate word other than the specific word as the result of the recognition, if the received 
absolute level of confidence is equal to or lower than a predetermined reference level. . .reference 
level is experimentally determined beforehand"; Khan teaches word-and-position information
at paragraphs 53, 58 and 69, as discussed above).

It would have been obvious for one of ordinary skill in the art to combine the teaching 
elements of Kitaoka and Khan to include word-and-position information because Khan teaches 
his method has several advantages including "improved performance (as measured by reduced 
processing time and reduced memory requirements) of ASR algorithms operating in an in-vehicle
environment" (paragraph 84).

Conclusion

6. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure. Seto et al. (US Patent 6,999,874) teaches a navigation device and related method. 
Ishii et al. (US Patent 6,067,521) teaches an interrupt correction of speech recognition for a 
navigation device; an input audio signal or vocalized speech undergoes speech processing to 
determine and recognize the region specified in the speech; data corresponding to the specified 
region is converted to coordinate position data for the region, and a map of the vicinity of the 
converted coordinate position data is displayed. 

7. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Eunice Ng whose telephone number is 571-272-2854. The 
examiner can normally be reached on Monday through Friday, 8:30 a.m. - 5:00 p.m. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/E. N./ 

Examiner, Art Unit 2626 



/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 



